Nature's Kidneys: the Role of Wetland Reserve Easements in Restoring Water Quality
Nicole Karwowski and Marin Skidmore
Replication File
2025

This file details how data and programs are combined to generate the final analysis, tables, and figures. 

################################################################################################################################################

Citations to all datasets used in the analysis

Esri. 2023. USA Rivers and Streams. ArcGIS Hub. Accessed August 25, 2025. https://hub.arcgis.com/datasets/esri::usa-rivers-and-streams/about.

Krasovich, E., P. Lau, J. Tseng, J. Longmate, K. Bell, and S. Hsiang. 2022. Standardized Nitrogen and Phosphorus Dataset (SNAPD). HydroShare. https://doi.org/10.4211/hs.9547035cf37940eb9b500b7994a378a1.

LaFontaine, J.H., and D.L. Blodgett. 2022. Twelve-Digit Hydrologic Unit Total Flow and Snowmelt from the National Hydrologic Model Infrastructure with the Precipitation-Runoff Modeling System, 1980–2016 [tabular digital data]. Accessed August 25, 2025. https://doi.org/10.5066/P902KHSM.

PRISM Group, Oregon State University. Gridded Weather Dataset. https://prism.oregonstate.edu. Data created December 15, 2023.

U.S. Department of Agriculture, Natural Resources Conservation Service. 2023. NEST Report. Obtained via FOIA request.

U.S. Department of Agriculture, Natural Resources Conservation Service. 2023. NRCS Conservation Easements Dataset. Geospatial Data Gateway. https://datagateway.nrcs.usda.gov/.

U.S. Department of Agriculture, National Agricultural Statistics Service. 2023. Cropland Data Layer. USDA NASS. Accessed January 9, 2023. https://nassgeodata.gmu.edu/CropScape/.

U.S. Department of Commerce, U.S. Census Bureau, Geography Division/Cartographic Products and Services Branch. 2018. County Shapefiles. U.S. Department of Commerce. Accessed August 25, 2025. https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html.

U.S. Environmental Protection Agency. 2024. Public Water System Violations Data. Environmental Compliance History Online (ECHO). Accessed March 21, 2024. https://echo.epa.gov/.

U.S. Environmental Protection Agency, National Center for Environmental Economics. 2022. Public Water System Intake by HUC12, Q2 2022.

U.S. Environmental Protection Agency. 2025. 2024 TRI Preliminary Dataset: Basic Plus Data Files. Accessed August 25, 2025. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-present.

U.S. Environmental Protection Agency. 2025. ICIS-NPDES Download Summary and Data Element Dictionary. Accessed August 25, 2025. https://echo.epa.gov/tools/data-downloads.

U.S. Environmental Protection Agency. Dasymetric Allocation of Population 2010 CONUS. EnviroAtlas. Accessed August 25, 2025. https://www.epa.gov/enviroatlas/dasymetric-toolbox.

U.S. Geological Survey. 2023. Watershed Boundary Dataset. Accessed August 25, 2025. https://www.usgs.gov/national-hydrography/watershed-boundary-dataset.

U.S. Geological Survey. 2022. Land Change Monitoring, Assessment, and Projection (LCMAP) Collection 1.3 Science Products for the Conterminous United States: USGS Data Release. https://doi.org/10.5066/P9C46NG0.

##################################################################################################################################################
#Description of Contents
There are four different folders in the Replication Package. 

The first is the Code Folder. Here you will find the R scripts and STATA do files that will process the data. 
The Code folder is broken up into three subfolders: 1_Clean_Data, 2_Merge_Data, and 3_Analyses_HUC12. 
The second folder is called Data. It contains the data sets in their Raw form plus the cleaned version in the Processed Folder. The "Main" data folder has the final dataset. 
The third folder is called Figures and includes the Figures that are in the main paper. 
The fourth folder is called Results and includes the tables and coefficient plots in the main paper. 
The final folder is called Supplementary Appendix and has the supplementary tables and figures. 

###################################################################################################################################################
#Step-by-step instructions for Replication
Make sure at the beginning of each script to set the working directory to your specific directory where the Replication materials are saved. 
Then run the following scripts in the specified order:

###################################################################################################################################################
#Part I: Cleaning the Data
#1_Clean_Data/1_Water_quality/1_SNAPsite_huc12: 30 minutes
This script loads HUC-12 watershed boundary data and water quality monitoring site coordinates, ensures they share a common coordinate reference system, and cleans the HUC-12 data by selecting relevant variables and converting numeric fields. It then spatially links each monitoring site to its corresponding HUC-12 unit, creates a merged dataset of sites with their watershed attributes, and saves the processed file.

#1_Clean_Data/1_Water_quality2_SNAPwq_huc12: 30 minutes
This code merges water quality monitoring data with HUC-12 watershed identifiers, cleans the data by removing missing values and outliers, and creates time variables for year and month. It then aggregates nutrient concentrations by HUC-12 and month, reshapes the dataset from long to wide format to create a panel dataset, and saves the processed file.

#1_Clean_Data/2_Wetland_easements/1_WRP_details_huc12: 15 minutes
This code cleans and standardizes detailed NRCS easement data, fixing missing values and dates, filtering for permanent easements, and preparing consistent variables on acres and crop acres. It then links the easements to HUC-12 watersheds, aggregates applications, closings, and restorations by year and month, and saves a panel dataset of easement activity at the HUC-12 level.

#1_Clean_Data/2_Wetland_easements/2_WRP_sf: 30 minutes
This script cleans and standardizes NRCS geospatial easement shapefiles by keeping only wetland-related programs, fixing invalid geometries, and creating consistent identifiers, date variables, and easement classifications. It then matches these spatial easements with detailed records through multiple merging strategies to recover restoration dates, imputes missing values where necessary, and saves a finalized cleaned shapefile.

#1_Clean_Data/3_Controls/0_MARB_huc12_sf: 20 minutes
This script builds a watershed shapefile for the Mississippi/Atchafalaya River Basin (MARB) study area. It identifies HUC12s with water quality data, extracts those polygons from the national watershed boundary dataset, assigns IDs, checks spatial extent, and saves the MARB subset as a new shapefile.

#1_Clean_Data/3_Controls/1_PRISM_huc12: 30 minutes
This script downloads monthly PRISM precipitation and temperature data (1980–2018), extracts mean values for each HUC12 watershed using exact polygon overlays, and reshapes the results into long-format datasets. It then saves the aggregated precipitation and temperature time series for each HUC12. 

#1_Clean_Data/3_Controls/2_CDL_huc12: 45 minutes
This script extracts annual Cropland Data Layer (CDL) land cover classifications for each HUC12 watershed in the MARB region from 2008–2018. It then merges, recodes, and aggregates the data into consistent land cover categories, producing a panel dataset of yearly land use proportions by watershed.

#1_Clean_Data/3_Controls/3_ECHO_pointpermits_huc12: 30 minutes
This script reads in annual EPA ECHO datasets on nitrogen and phosphorus discharges (2008–2018), cleans and standardizes the variables, and combines the yearly files into unified panels. It then aggregates the data to count the number of permitted nitrogen- and phosphorus-discharging facilities per HUC12 watershed per year, saving the resulting panel datasets.

#1_Clean_Data/3_Controls/4_County_huc12: 15 minutes
This script creates a spatial linkage between HUC12 watersheds in the Mississippi/Atchafalaya River Basin (MARB) and U.S. counties. It overlays the watershed and county shapefiles, calculates intersection areas, and generates a key that records each HUC12’s proportional overlap with counties, enabling weighted aggregation of county-level data (e.g., animal units) to the watershed scale.

#1_Clean_Data/3_Controls/5_LCMAP_huc12: 45 minutes
This script processes annual LCMAP land cover raster data (1985–2018) to create a HUC12-level panel dataset for the Mississippi/Atchafalaya River Basin (MARB). It overlays each year’s land cover raster with HUC12 boundaries, calculates the fractional area of each land cover class within watersheds, saves yearly outputs, and finally combines them into a single long panel.

#1_Clean_Data/3_Controls/6_RiversStreams_huc12: 30 minutes
This script calculates the nearest river or stream to each HUC12 watershed in the Mississippi/Atchafalaya River Basin (MARB). It aligns coordinate systems, finds the closest feature from the National Rivers and Streams shapefile, computes the distance, and saves a HUC12-level dataset linking watersheds to their nearest river/stream attributes.

#1_Clean_Data/3_Controls/7_CensusAg_AnimalUnits: 10 minutes
This Stata script processes USDA NASS animal inventory data by importing cattle, hog, and chicken counts, converting them to standardized Animal Units (AU), and aggregating them at the county × year level. It then linearly interpolates missing years between census data to produce a complete county-year dataset of animal units.

#1_Clean_Data/3_Controls/8_CensusAg_AnimalUnits_huc12: 10 minutes
This R script cleans and merges USDA NASS animal unit data with HUC12 watersheds by first creating a full HUC12 × year panel (1997–2017) and linking each HUC12 to its corresponding counties. It then calculates the estimated animal units within each HUC12 based on county-level animal data and the proportion of the HUC12 area within each county, imputes missing values as zero for small or urban counties, and aggregates to produce a HUC12-year dataset of animal units.

#1_Clean_Data/3_Controls/9_population_huc12: 15 minutes
This R script extracts population estimates for HUC12 watersheds in the Mississippi River Basin by overlaying a dasymetric 2010 population raster. It crops the raster to the MARB extent, sums the population within each HUC12, and outputs a HUC12-level population dataset.

#1_Clean_Data/3_Controls/10_tri_release_huc12: 20 minutes
This R script processes U.S. EPA TRI data from 1987–2018, cleans and reshapes it, and aggregates chemical releases (ammonia, nitrate, phosphorus) to the HUC12 watershed level. Facilities are spatially joined to HUC12 polygons, and the resulting HUC12-year panel contains total and water-specific releases for each chemical.

###################################################################################################################################################
#Part II: Data Merging

#2_Merge_Data/1_huc12_data_merge: 45 minutes
This script merges all of the cleaned datasets to create the main dataset used for the analysis. 

#2_Merge_Data/2_cb_pws_merge: 10 minutes
This script merges two different datasets from EPA regarding the public water systems location and population serviced. This dataset is later used in the cost benefit analyses. 

###################################################################################################################################################
#Part III: Run Analysis

#3_Analyses_HUC12/0_summary_table_jaere: 5 minutes
This file creates the summary table in the paper. 

#3_Analyses_HUC12/1_figures_maps_jaere: 20 minutes
This file creates the figures and maps in the paper.

#3_Analyses_HUC12/2_main_results: 30 minutes
This script creates the main tables and coefficient plots in the paper. 

#3_Analyses_HUC12/3_supplement:45 minutes
This script creates the tables and figures in the online supplement.

####################################################################################################################################################
#List of all output files associated with manuscript results and appendices
####################################################################################################################################################
#Figures

Figure 1: jaere_maps_nutrients_wrp_revised.eps
Figure 2: es_main_controls_sa.eps
Figure 3: ag_combined_plot.eps
Figure 4: payback_graph.eps
Figure 5: jaere_maps_benefits_byn_one.eps, jaere_maps_benefits_byn_all.eps
Figure 6: jaere_maps_benefits_bypws_one.eps, jaere_maps_benefits_bypws_all.eps
Figure B1: wetland nutrient cycle
Figure B2: cumulative_wetland_acres.eps

#####################################################################################################################################################
#Results
Table 1: summary_table.tex
Table 2: both_es_att.tex
Table 3: ammonia_treated_huc12_FE.tex, phosphorus_treated_huc12_FE.tex, tkn_treatedhuc12_FE.tex
Table 4: boe_breakeven.xlsx, BOE_table.tex

#####################################################################################################################################################
Supplement figures and tables with corresponding output files: 
Figure D1:othernutrients_es1.eps
Figure D2: p_nutrients_es_sa.eps
Figure E1: a1es.eps, n1es.eps, p1es.eps
Figure E2: a1es_uc.eps, a2es_uc.eps, n1es_uc.eps, n2es_uc.eps, p1es_uc.eps, p2es_uc.eps
Figure E3: a2es_ihs.eps, a2es_uc_ihs.eps, n2es_ihs.eps, n2es_uc_ihs.eps, p2es_ihs.eps, p2es_uc_ihs.eps
Figure E4: a2es_12fe.eps, a2es_uc_12fe.eps, n2es_12fe.eps, n2es_uc_12fe.eps, p2es_12fe.eps, p2es_uc_12fe.eps
Figure E5: a2es_closed.eps, a2es_uc_closed.eps, n2es_closed.eps, n2es_uc_closed.eps, p2es_closed.eps, p2es_uc_closed.eps 

Table D1:ammonia_unfiltered_twfe.tex, tkn_unfiltered_twfe.tex, nitrogen_unfiltered_twfe.tex, phosphorus_unfiltered_twfe.tex, orthophosphate_filtered_twfe.tex, orthophosphate_unfiltered_twfe.tex
Table E1: ammonia_ag_percentile.tex, tkn_ag_percentile.tex, phosphorous_ag_percentile.tex
Table F1: ammonia_treatedhuc12_upstream_WQ.tex, tkn_treatedhuc12_upstreamWQ.tex, phosphorous_treatedhuc12_upstreamWQ.tex
Table F2: ammonia_allhuc12_FE.tex, tkn_allhuc12_FE.tex, phosphorous_allhuc12_FE.tex
Table F3: ammonia_weather.tex, tkn_weather.tex, phosphorous_weather.tex
Table F4: ammonia_sa_robust.tex, tkn_sa_robust.tex, phosphorous_sa_robust.tex
Table F5: ihs_ammonia_treatedhuc12_FE.tex, ihs_tkn_treatedhuc12_FE.tex, ihs_phosphorous_treatedhuc12_FE.tex
Table F6: ammonia_treatedhuc12_pct.tex, tkn_treatedhuc12_pct.tex, phosphorous_treatedhuc12_pct.tex

####################################################################################################################################################
#Software Requirements
R version 4.4.1 was used to conduct this project. 
STATA version 18.0 BE-Basic Edition was used to conduct this project. 

#R libraries needed
# Data manipulation & tidy tools
library(dplyr)
library(data.table)
library(tidyverse)
library(tidyr)
library(collapse)
library(purrr)
library(Hmisc)
library(stringr)
library(readr)
library(lubridate)
library(units)

# Spatial & mapping
library(sf)
library(raster)
library(terra)
library(mapview)
library(leaflet)
library(tmap)
library(tigris)
library(CropScapeR)
library(lwgeom)
library(sp)
library(nngeo)
library(rgeos)
library(cartography)
library(maps)
library(RColorBrewer)
library(htmlwidgets)

# Visualization
library(ggplot2)
library(ggthemes)
library(ggfixest)
library(cowplot)
library(broom)
library(patchwork)
library(Cairo)

# Econometrics / statistical modeling
library(fixest)
library(plm)
library(lmtest)
library(sandwich)
library(did)
library(stargazer)
library(tableone)
library(cobalt)
library(boot)
library(pvaluefunctions)
library(xtable)
library(spatstat.utils)
library(naniar)

# Data import/export
library(haven)
library(openxlsx)
library(readxl)
library(fst)
library(prism)
library(usdarnass)

